Benchmarking RNA secondary structures comparison algorithms
Abstract
In the last ten years, several tools have been proposed for pairwise comparison of RNA secondary structures. These tools use different models (ordered tree or forest, arc-annotated sequence, multi-level tree) and methods (edit distance, alignment). We present a first online benchmark for comparing these tools. For various RNA families, we built two sets of secondary structures. The first, called the reference set, is composed of a small number of RNAs with their known structures. The second is composed of sequences folded using Mfold and RNAshapes. Some of these sequences correspond to structural RNAs of the same families (true events); others correspond to noise. We studied the ability of each tool to find the true events using the reference set.

Tools:

RNAforester [1] is a local/global alignment algorithm for ordered trees. It uses a special tree encoding that allows nucleotide pairings to be broken under certain conditions.

MiGaL [2] uses a multi-level representation of the secondary structure composed of four layers, each encoded as a rooted ordered tree. The layers model different structural levels, from the multiloop network down to the sequence of nucleotides composing the RNA. The algorithm is an adapted edit distance applied successively to each layer. (options: -M -hairpin-strict --indel-once)

TreeMatching [3] is based on a quotiented tree representation of the secondary structure, a similar structure made of two rooted ordered trees at two different scales (nucleotides and structural elements). The core of the method is the simultaneous comparison of both scales: it computes an edit distance between quotiented trees at the macroscopic scale, using edit costs defined as edit distances between subtrees at the microscopic scale.

Gardenia [4] and NestedAlign [5] use an arc-annotated sequence representation that allows complex edit operations, such as arc-breaking or arc-altering. Both offer local and global alignment.
Gardenia notably allows affine gap scores, while NestedAlign implements an original local alignment algorithm.

RNAStrAT [6] performs the comparison in two steps. First, it compares the stems of the two structures using an alignment algorithm with complex edit operations. Then it finds an optimal mapping between the different stems.

RNAdistance [7] implements a classical edit distance on a tree representation of the structure. A particularity of RNAdistance is that it does not take the RNA sequence into account.

We also compute a score using BLAST [8] (bl2seq -t blastn -W 4).

[1] M. Höchsmann, T. Töller, R. Giegerich, S. Kurtz. Local Similarity in RNA Secondary Structures. Proceedings of the IEEE Bioinformatics Conference, 2003.
[2] J. Allali, M.-F. Sagot. A multiple layer model to compare RNA secondary structures. Software: Practice and Experience, 2007 (online).
[3] A. Ouangraoua, P. Ferraro, L. Tichit, S. Dulucq. Local similarity between quotiented ordered trees. Journal of Discrete Algorithms, 2007.
[4] G. Blin, H. Touzet. How to compare arc-annotated sequences: The alignment hierarchy. SPIRE, 2006.
[5] C. Herrbach. Etude algorithmique et statistique de la comparaison de structures secondaires d'ARN. Thesis, 2007.
[6] V. Guignon, C. Chauve, S. Hamel. An edit distance between RNA stem-loops. SPIRE, 2005.
[7] I.L. Hofacker, W. Fontana, P.F. Stadler, S. Bonhoeffer, M. Tacker, P. Schuster. Fast Folding and Comparison of RNA Secondary Structures. Monatshefte für Chemie, 1994.
[8] S.F. Altschul, W. Gish, W. Miller, E.W. Myers, D.J. Lipman. Basic local alignment search tool. Journal of Molecular Biology, 1990.

Protocol:

For each run, two sets of RNA secondary structures are built. The reference set is composed of 4 to 6 RNAs of the same family, using the structures provided in the literature.
The data set is composed of structures obtained by folding sequences of RNAs of the same family (true events) and sequences of the same length as the references that are assumed not to belong to that family (called noise or false events).

[Figure: protocol overview. REFERENCE SET: structures of RNA of type F. TRUE EVENTS: sequences of RNA of type F. FALSE EVENTS: sequences sampled randomly from a noise source. FOLDERS. DATA SET: RNA secondary structures. TOOL. Events ordered by scores.]

For each event, the best score obtained between the references and all its possible structures (optimal and suboptimal structures found by Mfold or RNAshapes) is retained. Given all events sorted by their best scores, a ROC curve (false positive rate; true positive rate) is plotted.
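The scoring step of the protocol can be illustrated with a minimal Python sketch. It is not the benchmark's actual code: the function names (best_scores, roc_points), the event identifiers, and the input layout (one list of scores per event, covering every reference/structure pair) are assumptions made for illustration. The higher_is_better flag is also an assumption, added because alignment tools typically report similarities while edit-distance tools report distances. Tied scores are not treated specially here.

```python
from typing import Dict, List, Tuple


def best_scores(scores: Dict[str, List[float]],
                higher_is_better: bool = True) -> Dict[str, float]:
    """Keep, for each event, the best score over all its reference/structure pairs."""
    pick = max if higher_is_better else min
    return {event: pick(values) for event, values in scores.items()}


def roc_points(best: Dict[str, float],
               is_true: Dict[str, bool],
               higher_is_better: bool = True) -> List[Tuple[float, float]]:
    """Return (false positive rate, true positive rate) points, best-ranked event first."""
    ranked = sorted(best, key=lambda event: best[event], reverse=higher_is_better)
    n_true = sum(1 for event in ranked if is_true[event])
    n_false = len(ranked) - n_true
    if n_true == 0 or n_false == 0:
        raise ValueError("need at least one true event and one false event")
    tp = fp = 0
    points = [(0.0, 0.0)]
    for event in ranked:
        # Each event moves the curve one step up (true) or one step right (false).
        if is_true[event]:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_false, tp / n_true))
    return points


# Hypothetical toy data: two true events (t1, t2) and two noise events (n1, n2).
example_scores = {"t1": [0.9, 0.4], "t2": [0.3, 0.6], "n1": [0.5, 0.2], "n2": [0.1]}
example_labels = {"t1": True, "t2": True, "n1": False, "n2": False}
example_curve = roc_points(best_scores(example_scores), example_labels)
print(example_curve)
```

On this toy data the two true events rank above both noise events, so the curve rises to a true positive rate of 1 before the false positive rate leaves 0. A tool that ranked noise first would instead produce a curve hugging the bottom-right corner.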
Similar resources
BRASERO: A Resource for Benchmarking RNA Secondary Structure Comparison Algorithms
The pairwise comparison of RNA secondary structures is a fundamental problem, with direct application in mining databases for annotating putative noncoding RNA candidates in newly sequenced genomes. An increasing number of software tools are available for comparing RNA secondary structures, based on different models (such as ordered trees or forests, arc annotated sequences, and multilevel tree...
Benchmarking RNA secondary structure comparison algorithms
Over the last ten years, several tools have been proposed for RNA secondary structure pairwise comparison. These tools use different models (ordered tree or forest, arc annotated sequence, multi-level tree) and methods (edit distance, alignment). We present a first benchmark on these tools. For various RNA families, we built two sets of secondary structures. The first, called the reference set,...
PreRkTAG: Prediction of RNA Knotted Structures Using Tree Adjoining Grammars
Background: RNA molecules play many important regulatory, catalytic and structural ...
Efficient drawing of RNA secondary structure
In this paper, we propose a new layout algorithm that draws the secondary structure of a Ribonucleic Acid (RNA) automatically according to some of the biologists’ aesthetic criteria. Such a layout ensures that two equivalent structures (or sub-structures) are drawn in the same planar way. In order to allow a visual comparison of two RNAs, we use a heuristic that places the biggest similar part...
Widespread purifying selection on RNA structure in mammals
Evolutionarily conserved RNA secondary structures are a robust indicator of purifying selection and, consequently, molecular function. Evaluating their genome-wide occurrence through comparative genomics has consistently been plagued by high false-positive rates and divergent predictions. We present a novel benchmarking pipeline aimed at calibrating the precision of genome-wide scans for consen...
A comprehensive comparison of general RNA-RNA interaction prediction methods.
RNA-RNA interactions are fast emerging as a major functional component in many newly discovered non-coding RNAs. Basepairing is believed to be a major contributor to the stability of these intermolecular interactions, much like intramolecular basepairs formed in RNA secondary structure. As such, using algorithms similar to those for predicting RNA secondary structure, computational methods have...